Gene cluster of pederin biosynthesis genes

ABSTRACT

The present invention relates to the cloning, sequencing and analysing of a gene cluster encoding a modular polyketide synthase enzyme involved in the biosynthesis of the antitumor compound pederin. This novel cluster represents the first example of genes from an unculturable symbiont encoding the biosynthesis of a drug candidate.

The present invention relates to the cloning, sequencing and analysing of a gene cluster encoding a modular polyketide synthase enzyme involved in the biosynthesis of the antitumor compound pederin.

In particular, the present invention relates to a novel isolated nucleic acid comprising a pederin biosynthetic gene cluster, fragments of this gene cluster, corresponding polypeptides, vectors and recombinant host cells or transgenic organisms comprising said nucleic acids, a method for producing pederin and the use of said nucleic acids for preparing a modified pederin biosynthesis gene cluster or modified pederin molecules.

Thus, the invention relates to novel genes and isolated nucleic acids encoding polypeptides/proteins exhibiting functional activities involved in pederin biosynthesis, such polypeptides themselves, and methods or uses to prepare pederin or modified pederin derivatives.

Invertebrates, particularly those from marine environments, are an important source of natural products with high therapeutic potential. The low availability of most of these metabolites, however, represents a serious impediment to drug development. As many invertebrates are difficult to cultivate and chemical synthesis is usually not economical, alternative and ecologically friendly sources of natural products are urgently needed. The actual producers of many drug candidates isolated from invertebrates may well be symbiotic bacteria, but so far no producing symbiont has ever been successfully cultured. Genes from bacterial secondary metabolism are usually clustered, which can simplify their cloning and transfer into a heterologous host. Heterologous expression in a culturable bacterium could therefore generate renewable sources of rare symbiont-derived drug candidates isolated from invertebrates.

The highly active antitumor compounds of the pederin group (FIG. 1) represent the strongest evidence for such a bacterial biosynthesis (Narquizian, R., Kocienski, P. J., The pederin family of antitumor agents: structures, synthesis and biological activity. The Role of Natural Products in Drug Discovery. Ernst Schering Research Foundation Workshop Series, 32, Springer-Verlag, Heidelberg, Germany, 25-56 (2000)). While almost all of these metabolites were isolated from marine sponges, pederin is exclusively known from terrestrial Paederus and Paederidus beetles. These notorious insects carry pederin as a vesicant deterrent in their hemolymph and cause severe dermatitis when accidentally crushed on human skin. In all Paederus species studied so far, up to 90% of the females contain high levels of pederin, and only these (+)-females pass this trait on to their offspring. Pederin-free (−)-females do not produce (+)-offspring unless they are fed eggs of (+)-females. This non-Mendelian mode of inheritance can be prevented if the (+)-eggs are previously treated with antibiotics, which strongly suggests bacterially mediated pederin biosynthesis. In stark contrast to the large number of suspected symbiont drug candidates from marine invertebrates, pederin is the only terrestrial example known to date.

The structure of pederin and early labelling studies suggest that the metabolite is largely synthesised from malonyl- and methylmalonyl-coenzyme A (CoA) units by a type I polyketide synthase (PKS). Such megasynthases consist of repeated modules, along which the growing polyketide chain is processed in an assembly line-like fashion. Normally, each module minimally carries ketosynthase (KS), acyltransferase (AT) and acyl carrier protein (ACP) domains to perform exactly one chain elongation cycle, and optional additional domains to catalyze further modifications.

As pointed out above, drug development from natural sources is commonly hampered by low yields and the difficulty of sustaining invertebrate cultures. To obtain insight into the true producer and to find alternative sources for these rare drug candidates, it was the object of the present invention to establish a way to provide pederin in a more convenient and economic fashion.

This object was solved by cloning, sequencing and analysing the pederin genes.

In a first aspect, the present invention provides an isolated nucleic acid comprising a pederin biosynthetic gene cluster or being complementary to a sequence comprising a pederin biosynthetic gene cluster. This cluster represents the first example of genes from an unculturable symbiont encoding the biosynthesis of a drug candidate.

This gene cluster is preferably derived from Paederus or Paederidus rove beetles, and in particular from a bacterial symbiont of Paederus or Paederidus rove beetles.

The isolated nucleic acid preferably comprises nucleic acid fragments forming individual units and/or modules of the pederin biosynthetic gene cluster as it is shown in more detail in FIG. 3. As depicted in FIG. 3, the cluster contains the units pedA to pedH which are either coding units for individual enzymes or for one or several polyketide synthase or nonribosomal peptide synthetase modules. The pedF unit, for example, comprises five polyketide synthase modules and one peptide synthetase module and/or the pedH unit comprises four polyketide synthase modules and one peptide synthetase module. Each polyketide synthase module comprises at least one ketosynthase domain and one acyl carrier protein domain. The isolated nucleic acid preferably comprises one or more of the pedA to pedH units essentially consisting of the nucleic acid sequences encoding the protein sequences shown in SEQ ID NO: 2 to 9.

In a particularly preferred embodiment, the isolated nucleic acid according to the present invention comprises:

-   a nucleotide sequence as shown in SEQ ID NO:1; or
-   a nucleotide sequence which is the complement of SEQ ID NO:1; or
-   a nucleotide sequence hybridising under highly stringent conditions to SEQ ID NO:1 or to the complement thereof; or
-   a nucleotide sequence having at least 80% sequence identity with SEQ ID NO:1 or with the complement thereof.
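
The complement and sequence-identity criteria listed above can be illustrated with a minimal Python sketch. It assumes two pre-aligned sequences of equal length and simply counts identical positions; the function names and the toy sequences are hypothetical and are not taken from SEQ ID NO:1. In practice, identity would be derived from a proper alignment produced by a dedicated tool.

```python
# Illustrative sketch only: toy sequences, not taken from SEQ ID NO:1.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions of two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity_threshold(a: str, b: str, threshold: float = 80.0) -> bool:
    """True if the two sequences share at least `threshold` percent identity."""
    return percent_identity(a, b) >= threshold


reference = "ATGGACCCGC"   # hypothetical reference fragment
candidate = "ATGGATCCGC"   # differs from the reference at one position

print(reverse_complement(reference))
print(percent_identity(reference, candidate))
print(meets_identity_threshold(reference, candidate))
```

A candidate that passes the check against the complement rather than the forward strand can be tested by first applying `reverse_complement` to it.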

Under a further aspect, the present invention is directed to nucleic acid fragments selected from the group consisting of pedA, pedB, pedC, pedD, pedE, pedF, pedG and/or pedH as shown in FIG. 3. Especially important are the fragments essentially consisting of pedF and/or pedH. Further preferred are the nucleic acid fragments comprising one or more nucleotide sequences encoding the protein sequences as shown in SEQ ID NOs: 2 to 9. Also preferred are the corresponding parts of the nucleotide sequence SEQ ID NO:1.

Furthermore, the invention is directed to a polypeptide encoded by a nucleic acid as described above. The polypeptide preferably has functional activity in the synthesis of pederin and/or a polyketide and/or a peptide synthetase moiety.

In addition, the invention also provides a vector comprising a nucleic acid consisting essentially of the pederin biosynthetic gene cluster or a vector comprising a nucleic acid as described above as well as a recombinant host cell or a transgenic organism comprising said nucleic acid or containing said vector. In a preferred embodiment, the host cell used is a bacterial cell. As bacterial cells, Pseudomonas, Acinetobacter, Bacillus or Streptomyces cells are particularly preferred.

Finally, a method for producing pederin using a recombinant host cell or a transgenic organism as described above is provided, comprising the steps of:

-   culturing the recombinant host cell under conditions to express the pederin biosynthetic gene cluster; and
-   isolating the produced pederin.

The inventive nucleic acids can be used in the preparation of a modified pederin biosynthesis gene cluster or in the preparation of a modified pederin molecule. Modified pederin molecules might be used as alternative antitumor agents and might be even more potent antitumor agents than the original pederin.

In the following, reference is made to the figures further illustrating the present invention.

FIG. 1 shows some members of the pederin family of antitumor compounds isolated from terrestrial beetles and marine sponges.

FIG. 2 illustrates the PCR amplification of PKS gene fragments from beetle total DNA. a, TLC analysis of the ethanolic beetle extracts. The arrow indicates the position of pederin. Small amounts of pederin found in males and (−)-females are due to pederin transfer from (+)-females to the offspring. b, Agarose gel of the PCR products obtained from total DNA of the same beetle specimens that were used for pederin extraction. L, 1 kb DNA ladder; B, blind PCR control without template DNA; 1, Paederus fuscipes, (+)-female, collected at Jena, Germany; 2, P. fuscipes, (−)-female, Jena; 3, P. fuscipes, male, Jena; 4, P. fuscipes, (+)-female, collected at Aydin, Turkey; 5, P. fuscipes, (−)-female, Aydin; 6, P. fuscipes, male, Aydin; 7, Paederidus rubrothoracicus, (+)-female, Aydin; 8, Pd. rubrothoracicus, (−)-female, Aydin; 9, Pd. rubrothoracicus, male, Aydin; 10, Paederus litoralis, (+)-female, Jena; 11, P. litoralis, male, Jena. No (−)-female of P. litoralis was available.

FIG. 3 is a map of the sequenced ped genes and the proposed pederin biosynthesis pathway. MT, methyltransferase; OR, oxidoreductase; AT, acyltransferase; KS, ketosynthase domain; ACP, acyl carrier protein domain; KR, ketoreductase domain; DH, dehydratase domain; (DH), putative nonfunctional dehydratase domain; OXY, oxygenase; C, nonribosomal peptide synthetase condensation domain; A, nonribosomal peptide synthetase adenylation domain; T, nonribosomal peptide synthetase thiolation domain; TE, thioesterase domain.

FIG. 4 shows a comparison of the first module of PedF and the first module of the avermectin AVES2 protein, which is a regular type I polyketide synthase. The shaded area in AVES2-1 indicates the AT region that is deleted in PedF1. Conserved motifs are shown as vertical bars. Motifs are: 1, EPIAIV SEQ ID NO: 20; 2, DPQQRL SEQ ID NO: 21; 3, CSSS SEQ ID NO: 22; 4, HGTGTXLGD SEQ ID NO: 23; 5, GxGGxNAHVILEE SEQ ID NO: 24; 6, YTL; 7, GHSxG SEQ ID NO: 25; 8, YPF; 9, GxDS SEQ ID NO: 26.

The sequences mentioned in this application are listed in the attached sequence listing. These sequences are briefly summarized in the following:

SEQ ID NO:1: nucleic acid sequence of the pederin biosynthetic gene cluster

SEQ ID NO:2: protein sequence of PedA putative methyltransferase

SEQ ID NO:3: protein sequence of PedB putative FMN-dependent oxidoreductase

SEQ ID NO:4: protein sequence of PedC putative acyltransferase

SEQ ID NO:5: protein sequence of PedD putative acyltransferase

SEQ ID NO:6: protein sequence of PedE putative methyltransferase

SEQ ID NO:7: protein sequence of PedF mixed type I polyketide synthase/nonribosomal peptide synthase (module 1 PKS (KS-ACP), module 2 NRPS (C-A-T), module 3 PKS (KS-KR-ACP), module 4 PKS (KS-KR-MT-ACP), module 5 PKS (KS-KR-DH-DH-ACP), module 6 PKS (KS-KR-ACP), module 7 incomplete PKS (KS-DH))

SEQ ID NO:8: protein sequence of PedG putative flavin-binding monooxygenase

SEQ ID NO:9: protein sequence of PedH mixed type I polyketide synthase/nonribosomal peptide synthase (module 1 incomplete PKS (ACP), module 2 PKS (KS-DH-KR-ACP), module 3 PKS (KS-DH-ACP), module 4 PKS (KS-KR-ACP), module 5 PKS (KS-ACP), module 6 NRPS (C-A-T-TE))

SEQ ID NO:10 and SEQ ID NO:11: nucleic acid sequences of degenerate primers used during the cloning of the ped genes

SEQ ID NO:12 to SEQ ID NO:19: nucleic acid sequences of primers used during the cloning of the ped genes

The present invention will now be described in further detail with reference to a particular example.

According to the example of the present invention, pederin biosynthesis genes were cloned from total DNA of Paederus fuscipes beetles, which use this compound for chemical defense. Sequence analysis of the gene cluster and adjacent regions revealed the presence of open reading frames with typical bacterial architecture and homologies. The cluster is present only in female beetles with high pederin content and encodes a mixed modular polyketide synthase—nonribosomal peptide synthetase. Notably, none of the modules contains regions with homology to acyltransferase domains, but two copies of isolated monodomain acyltransferase genes were found at the upstream end of the cluster. This architecture suggests a novel mechanism of extender unit selection, distinct from previously described modular polyketide systems. The cluster represents the first example of cloned genes from an unculturable invertebrate symbiont that encodes the biosynthesis of a potential drug candidate.

To clone the pederin cluster, a PCR strategy was pursued involving degenerate primers based on universally conserved motifs of KS domains. Total DNA, isolated from different beetle specimens, was used as a PCR template. Analysis of three species of the genera Paederus and Paederidus collected at two different locations consistently revealed that only those adult beetles with a high pederin content gave the PCR product expected to arise from the presence of PKS genes (FIG. 2). Amplification products were also obtained by using eggs from (+)-females of a fourth species, P. riparius, from a third locality (data not shown). Sequencing of the amplified DNA showed, in all cases, the exclusive presence of the same group of three to four different sequences possessing strong homology to KS domains of bacterial type I PKSs. This is shown in the following Table 1.

TABLE 1 Comparison of DNA sequences obtained by PCR from different beetle specimens.

| Sequence name | P. fuscipes adults (Jena, Germany) | P. fuscipes adults (Aydin, Turkey) | P. litoralis adults (Jena, Germany) | P. riparius eggs (Bayreuth, Germany) | Paederidus rubrothoracicus (Aydin, Turkey) | Highest homology |
|---|---|---|---|---|---|---|
| PKS1 | + | + | + | + | + | B. subtilis Pks (54%) |
| PKS2 | + | + | + | + | + | B. subtilis Pks (67%) |
| PKS3 | + | + | + | − | + | B. subtilis Pks (53%) |
| PKS4 | + | − | − | + | + | B. subtilis Pks (44%) |

Note: + and − indicate present and absent sequences, respectively.

The perfect correlation between pederin content and PKS sequences, independent of species and locality of collection, suggests that the amplified fragments belong to different modules of the pederin cluster. These DNA fragments were therefore used to locate the cluster.

To this end, a metagenomic cosmid library was constructed from total DNA of P. fuscipes beetles. By screening this library with specific PCR primers derived from the amplified sequences (see Methods below), three positive cosmids were identified. Sequencing of a 52.7 kb region revealed the presence of ORFs homologous to type I PKS genes, designated as ped genes. All of the amplified KS sequences could be found on these ORFs. Additional regions covering ca. 60 kb outside of the cluster were obtained on two cosmids isolated by chromosome walking and subjected to extensive spot sequencing. All putative genes present on these cosmids exhibit typical bacterial features: they are tightly packed, free of introns and polyadenylation sites, and preceded by Shine-Dalgarno patterns at appropriate distances from the start codons. Furthermore, when subjected to database homology searches, each of the translated ORFs exhibited the highest similarity to bacterial proteins. Among the homologies to 65 different ORFs analysed, 15 are exclusively known from prokaryotes, such as enzymes used in vitamin B₁₂ biosynthesis, type II fatty acid synthase components and regulatory proteins of the LuxR and LysR families. From these findings the inventor concluded that the ped cluster is located on a bacterial genome.

FIG. 3 summarizes the results from an analysis of the completely sequenced 52.7 kb region. The predicted gene products of the ORFs pedF and pedH are giant proteins of 8601 and 6266 amino acids (aa), respectively, resembling mixed modular PKSs/nonribosomal peptide synthetases (NRPSs). Strikingly, AT homologies are completely absent from these proteins. Alignment with other known type I PKSs revealed that a continuous ~300 aa region of each AT domain, including the active site GHS motif, is deleted in pedF and pedH, with no other homologies replacing it (FIG. 4), which leads to a considerable shortening of each module. In modular PKSs, the AT domain is crucial in the selection of the correct acyl-CoA unit in each extension cycle, raising the interesting question of how this process is controlled during assembly of pederin from two different acyl-CoA units. The ORFs pedC and pedD, located upstream of pedF and encoding deduced proteins with homology to monodomain ATs, could play a role in selectivity control. It is tempting to speculate that each of these isolated ATs recognises a different extension unit and is bound and used iteratively by cognate PKS modules. The ped cluster would then encode a type I protein complemented by repetitive type II enzymes. Intriguingly, a similar PKS system containing such putative “super ATs” is encoded on the genome of Bacillus subtilis. The gene products of the pks cluster consist of a large number of PKS modules without AT domains and, encoded at the upstream end of the cluster, three isolated ATs. The secondary metabolite generated by these proteins is not known. Sequence comparison with other known PKS clusters reveals that the ped and the B. subtilis pks clusters are more closely related to each other than to any other known PKS cluster. The two clusters could therefore belong to a phylogenetically distinct subgroup of functionally novel type I PKS systems.
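
As a quick consistency check on the protein sizes given above, the pedF coordinates recited in claim 1 (nucleotides 6309-32114 of SEQ ID NO:1) can be converted into a protein length. The sketch below assumes that the coordinates are 1-based and inclusive and that the range ends with a stop codon; the helper name is hypothetical.

```python
def protein_length_from_orf(start: int, end: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF with inclusive 1-based coordinates."""
    n_nt = end - start + 1
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of three")
    codons = n_nt // 3
    # Assumption: the final codon of the range is the stop codon.
    return codons - 1 if includes_stop else codons


# pedF coordinates as recited in claim 1 of this document.
pedF_aa = protein_length_from_orf(6309, 32114)
print(pedF_aa)
```

Under these assumptions the range spans 25806 nucleotides, that is 8602 codons, which gives 8601 aa and agrees with the size stated for PedF.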

With few exceptions, the order of encoded modules in type I PKS clusters strictly correlates with the sequence of biosynthetic steps. In most cases the core structure of the metabolite can therefore be predicted from the gene architecture and vice versa. Except for the missing first three modules, which could not be found in the sequenced region, the pederin structure is perfectly mirrored in pedF (FIG. 3). The single ORF should thus be responsible for the formation of the largest part of the pederin molecule except for the six-membered ring bearing the exomethylene group. Characteristic features of the cluster include (i) O-methyltransferase and oxygenase tailoring genes upstream and downstream of pedF, most likely involved in the final biosynthetic steps, (ii) a rare methyltransferase domain in PedF that should catalyse the formation of the uncommon geminal dimethyl group, as known from epothilone biosynthesis, (iii) a repeated dehydratase domain in PedF that putatively performs a sequential elimination of water and intramolecular alcohol attack to generate the dimethylated tetrahydropyran ring, and (iv) an NRPS module in PedF that likely incorporates an amino acid to account for the amide bond. Previous studies have identified sequence patterns of NRPS activating domains that can be used to predict the amino acid incorporated by the corresponding module. The structure of pederin suggests that glycine is selected by the NRPS module of PedF. In accordance, a pattern analysis of the PedF NRPS module revealed 100% identity to the known consensus nonribosomal code for glycine.
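
The collinearity rule described above can be sketched in Python. The domain strings are those listed for PedF under SEQ ID NO:7. The mapping from domain composition to the expected state of the β-carbon follows the general logic of modular type I PKSs and is an illustrative assumption, not an analysis performed in this application; the treatment of the repeated DH domain simply restates the proposal made in the text. Function and variable names are hypothetical.

```python
# Module architecture of PedF as listed under SEQ ID NO:7.
PEDF_MODULES = [
    ("PKS", "KS-ACP"),
    ("NRPS", "C-A-T"),
    ("PKS", "KS-KR-ACP"),
    ("PKS", "KS-KR-MT-ACP"),
    ("PKS", "KS-KR-DH-DH-ACP"),
    ("PKS", "KS-KR-ACP"),
    ("PKS", "KS-DH"),  # listed as incomplete
]


def predict_unit(kind: str, domains: str) -> str:
    """Rough prediction of what a module contributes (generic rule of thumb)."""
    parts = domains.split("-")
    if kind == "NRPS":
        return "amino acid incorporated via amide bond"
    if "ACP" not in parts:
        return "incomplete module"
    if "KR" not in parts:
        state = "beta-keto"
    elif "DH" not in parts:
        state = "beta-hydroxy"
    elif parts.count("DH") > 1:
        # Assumption taken from the text: the repeated DH is proposed to
        # dehydrate and then mediate intramolecular ring closure.
        state = "dehydration, then proposed ring closure"
    elif "ER" not in parts:
        state = "alpha,beta-double bond"
    else:
        state = "fully reduced"
    if "MT" in parts:
        state += ", C-methylated"
    return state


predictions = [predict_unit(kind, domains) for kind, domains in PEDF_MODULES]
for number, prediction in enumerate(predictions, start=1):
    print(f"module {number}: {prediction}")
```

Reading the predictions in module order gives a coarse outline of the chain that can be compared against the pederin structure in FIG. 3.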

Analysis of the modular architecture of the pederin cluster as shown in FIG. 3 further suggests that the biosynthesis of pederin proceeds via a larger intermediate that is cleaved oxidatively into two fragments. One of these fragments would be converted by further biosynthetic steps to pederin. The intermediate presumably is very similar to the longer-chain compounds derived from sponges like onnamide and icadamide B. Manipulation of the pederin cluster (e.g. by inactivating the cleaving oxygenase gene) should therefore also enable access to analogues of these marine drugs.

Taken together, these findings and the fact that the ped genes contained all of the amplified KS fragments and only occur in beetles with high pederin content, independent of species and geography, are compelling evidence that the cloned gene cluster indeed encodes pederin biosynthesis.

The present invention showed for the first time that genes responsible for the biosynthesis of rare invertebrate drug candidates can be cloned from unculturable bacterial symbionts. Since whole sets of type I PKS genes have been functionally expressed even in E. coli, a similar production of “invertebrate” natural products in a suitable host has now become a realistic scenario. The novel structure of the ped cluster, featuring small-sized modules and ATs with putatively high catalytic flexibility, furthermore offers fascinating possibilities for generating unnatural drug analogues by combinatorial biosynthesis.

Methods.

Pederin analysis. Beetle species were determined by R. L. L. Kellner and H. Baspinar. Beetles were stored individually in ethanol immediately after collection to preserve the DNA. For pederin analysis the ethanol was concentrated to 50 μl, and 10 μl were spotted on a silica gel TLC plate (Merck). The plate was then developed in ethyl acetate and stained with anisaldehyde reagent. A pink spot at R_F = 0.22 was specific for pederin.

Cloning of the ped genes. A QIAamp DNA mini kit (Qiagen) was used to extract DNA from adult beetles with known pederin content. This DNA was used as PCR template. For egg DNA templates, one egg was ground in PCR buffer at ° C. using a Wheaton homogenizer (previously treated with concentrated HCl for 15 min and washed with sterile H₂O), transferred into a PCR tube, frozen and thawed three times and subsequently boiled for 5 min; then the remaining PCR components were added. For all initial reactions, the primers KSDPQQF (5′-MGNGARGCNNWNSMNATGGAYCCNCARCANMG-3′) (SEQ ID NO:10) and KSHGTGR (5′-GGRTCNCCNARNSWNGTNCCNGTNCCRTG-3′) (SEQ ID NO:11) and Platinum Taq DNA Polymerase-High Fidelity (Invitrogen) were used (M=A+C, R=A+G, W=A+T, S=C+G, Y=C+T, N=A+T+C+G). Each PCR experiment was performed at least in triplicate except for the rarer P. litoralis adults, where only two reactions could be run for each sex. PCR products were ligated into the pGEM-T Easy vector (Promega) and digested with RsaI. Plasmids showing a unique restriction pattern were sequenced using the BigDye Terminator Ready Mix (Applied Biosystems) and an ABI 3700 sequencer (Applied Biosystems). From these sequences the following primer pairs specific for single modules were designed: KS1F/KS1R, 5′-TGGCATCGTGGGGAAAGGCTG-3′ (SEQ ID NO:12) and 5′-GGCGCAGGTGCTGACACGC-3′ (SEQ ID NO:13); KS2F/KS2R, 5′-TTAGCCATCGAGAGTTACAGCTC-3′ (SEQ ID NO:14) and 5′-AATCGCCGATAGCCATCGCCG-3′ (SEQ ID NO:15); KS3F/KS3R, 5′-GACGCCATGGATGCACTGCAC-3′ (SEQ ID NO:16) and 5′-TATTGGATGCTCAGCACCGCAC-3′ (SEQ ID NO:17); and KS4F/KS4R, 5′-GGGCTCAGTTTCCACCCTTATG-3′ (SEQ ID NO:18) and 5′-CCGGCGCTGCAGAGCCAGG-3′ (SEQ ID NO:19).

A cosmid library was prepared from total DNA of 10 P. fuscipes (+)-females collected in Aydin, Turkey, using the pWEB cosmid cloning kit (Epicentre). The library was plated at concentrations to yield about 300 colonies per plate. The bacteria from each plate were combined, and the complete plasmid DNA isolated from 12 plate pools was screened by diagnostic PCR using the specific primers. Positive pools were plated at numbers of 50 per plate and rescreened. This procedure was repeated until single positive colonies could be identified. Positive cosmids were sonicated, end-repaired by BAL-31 and Klenow fragment and size-fractionated by gel electrophoresis to yield fragments of 1-2 kb in length. These fragments were ligated into the EcoRV site of pBluescript II K (Stratagene) and end-sequenced. Remaining gaps were filled by using specifically designed primers and by targeted subcloning. Sequence analysis was performed by using BLASTX, PROSITE, FRAMEPLOT and the Lasergene DNASTAR software package.
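
The degenerate primer notation used in the cloning procedure can be made concrete with a short Python sketch. It expands the ambiguity codes defined in the text (M, R, W, S, Y, N) into a regular expression and computes the degeneracy, that is, the number of distinct oligonucleotides a primer represents. The function names are hypothetical; the primer is KSDPQQF (SEQ ID NO:10) as given in the text, and the test target is generated from the primer itself purely for illustration.

```python
import re
from math import prod

# Ambiguity codes exactly as defined in the Methods text.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(AMBIGUITY[base]) for base in primer)


def primer_regex(primer: str) -> re.Pattern:
    """Compile a regular expression matching every sequence the primer covers."""
    pattern = "".join(
        base if len(AMBIGUITY[base]) == 1 else f"[{AMBIGUITY[base]}]"
        for base in primer
    )
    return re.compile(pattern)


KSDPQQF = "MGNGARGCNNWNSMNATGGAYCCNCARCANMG"  # SEQ ID NO:10

# Hypothetical target: one concrete member of the primer pool with flanks.
one_member = "".join(AMBIGUITY[base][0] for base in KSDPQQF)
target = "TTT" + one_member + "GGG"

print(degeneracy(KSDPQQF))
print(primer_regex(KSDPQQF).search(target) is not None)
```

A high degeneracy means that each individual oligonucleotide is present at a low concentration in the primer pool, which is one reason why module-specific primers such as KS1F/KS1R were subsequently designed from the sequenced products.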

1. An isolated nucleic acid comprising a pederin biosynthetic gene cluster or being fully complementary to a sequence comprising a pederin biosynthetic gene cluster, comprising a pedF nucleotide sequence, wherein the pedF nucleotide sequence is selected from the group consisting of nucleotides 6309-32114 of SEQ ID NO: 1 and fully complementary sequence of nucleotides 6309-32114 of SEQ ID NO: 1.

2. The isolated nucleic acid according to claim 1 which is derived from a bacterial symbiont of Paederus or Paederidus rove beetles.

3. The isolated nucleic acid according to claim 1, wherein the pedF encodes a protein sequence as shown in SEQ ID NO:7.

4. The isolated nucleic acid according to claim 1, comprising a nucleotide sequence selected from the group consisting of SEQ ID NO:1, and a fully complementary nucleotide sequence, wherein the complementary nucleotide sequence is the complement of SEQ ID NO:1.

5. A vector comprising the nucleic acid comprising the pederin biosynthetic gene cluster of claim 1.

6. A vector comprising the nucleic acid according to claim 1.

7. A recombinant host cell or a transgenic organism comprising the nucleic acid according to claim 1.

8. A recombinant host cell according to claim 7 which is a bacterial cell.

9. A method for producing pederin using a recombinant host cell or a transgenic organism according to claim 7 comprising the steps of: culturing the recombinant host cell or the transgenic organism under conditions to express the pederin biosynthetic gene cluster; and isolating the produced pederin.

10. The recombinant host cell according to claim 8 wherein the bacterial cell is a Pseudomonas, Acinetobacter, Bacillus or Streptomyces cell.